Radix 16 Evaluation of Some
Elementary Functions


by Milos D. Ercegovac*

Abstract 

This paper describes an approach for obtaining a class of similar
algorithms for evaluation of some elementary functions. The main objective
is to show the feasibility of higher radix implementations, in particular radix
16, as they offer better performance than radix 2. The emphasis is not on
optimality of a single algorithm but rather on the optimality of the whole
class of algorithms. An attempt to implement a much wider class of functions
than is done in conventional arithmetic units would be encouraged by the
present level of technology and by the existence of suitable algorithms.
Besides the definitions of the algorithms, which are based on continued
products (sums), some details related to implementation are discussed.

1.  Introduction

The  use  of  continued  products  in  the  calculation  of  some  elementary 
functions appears as early as 1959, in Volder's CORDIC technique [5]. The
main  results  of  this  approach,  based  on  coordinate  system  transformations, 
have  been  recently  summarized  in  the  form  of  a  unified  algorithm  by 
Walther  [6].  Without  using  the  notion  of  coordinate  rotation,  Specker  [7] 
derived  a  class  of  algorithms  using  the  concept  of  continued  products. 
DeLugish  [1]  has  defined  powerful  algorithms  for  a  wide  class  of  elementary 
functions.   Since  the  subject  of  this  paper  is  motivated  by  some  ideas  and 
is  based  upon  results  obtained  by  DeLugish,  a  brief  overview  of  some  of 
his  ideas  and  results  follows. 

The available technological possibilities could justify hardware
implementation of a wide class of functions without essentially affecting
the relative cost of the arithmetic unit, given the similarity of the proposed
algorithms.

One  effective  way  to  derive  such  a  class  of  algorithms  is  to  use 
continued  products  (CP)  or  continued  sums  (CS)  during  function  evaluation. 
The  CP's  or  CS's  are  closely  related  to  the  process  of  normalization.   The 
main  idea  is  then  to  replace  each  required  operation  by  two  processes, 
which utilize only the simplest operations: addition/subtraction, shifting
and  possibly  access  to  precomputed  constants  in  a  read-only  memory. 

One  of  the  processes  is  normalization,  iteratively  performed  in 
one  arithmetic  unit  giving  as  a  result  digits  of  the  CP  (CS)  representation, 
one  at  a  time.  Using  these  digits,  the  second  arithmetic  unit  performs 
result  evaluation  at  the  same  time.   The  apparent  disadvantage  of  this 
approach,  namely  use  of  at  least  two  separate  arithmetic  units,  can  be 
eliminated  without  affecting  the  speed  significantly,  as  we  propose  later. 


Another remark applies to the fact that digit-by-digit evaluation is not a
consequence  of  inherent  properties,  but  reflects  realization  strategy,  which 
attempts  to  achieve  reasonably  fast  implementation  for  all  functions  under 
consideration,  retaining  at  the  same  time  simplicity. 

The  use  of  redundant  representations  [3]  results  in  simpler  selection 
procedures,  which  are  essential  in  normalization,  and,  in  the  case  of  radix 
2,  increases  speed  by  increasing  the  probability  of  a  zero. 

DeLugish  has  shown  that  for  a  class  of  functions,  which  includes 
division,  multiplication,  square  root,  logarithm,  exponential,  trigonometric 
and  inverse  trigonometric  functions,  operation  times  are  from  1  to  3 
multiplication  cycle  times. 

For the radix 16 case, we will present algorithms without complete
derivations, which are given in [9].

In  what  follows,  we  consider  only  algorithms  for  fractional  parts 
of  floating  point  numbers,  containing  m  radix  16  digits  and  sign,  assuming 
that  exponent  arithmetic  can  be  done  in  a  conventional  manner.  Also,  only 
positive  initial  values  are  considered. 

By normalization of a given number $X_0 \in [1/2, 1)$ we mean a sequence of
transformations such that either

$$X_0 \cdot \prod_{i=0}^{m} M_i \to 1 \tag{1.1}$$

or

$$X_0 - \sum_{i=0}^{m} A_i \to 0 \tag{1.2}$$

where multipliers are of the form

$$M_k = 1 + 16^{-k} S_k, \quad 0 \le k \le m;$$

summands are of the form

$$A_k = 16^{-k} S_k, \quad 0 \le k \le m;$$

and $S_k \in \{\overline{10}, \overline{9}, \ldots, \overline{1}, 0, 1, \ldots, 9, 10\}$ are step dependent constants
(an overbar denotes a negative digit). (1.1) is called multiplicative
normalization and uses continued products, while (1.2) is additive
normalization using continued sums. The constants $S_k$ are, in other words,
the digits of the corresponding CP (CS) representation. As defined by
Robertson in [2], the choice of the set of values for $S_k$ corresponds to the
redundancy ratio of 2/3, allowing efficient multiple formation as well as
low precision selection rules.

2.  Multiplicative  Normalization 

The multiplicative normalization is performed recursively as

$$X_{k+1} = X_k (1 + 16^{-k} S_k), \quad 0 \le k \le m \tag{2.1}$$

so that $X_{m+1} = X_0 \prod_{i=0}^{m} (1 + 16^{-i} S_i)$ is the normalized $X_0$. $S_k$ can be
determined on the basis of $X_k$, but to keep selection dependent on the same
register positions, it is convenient to define the scaled remainder as

$$R_k = 16^{k-1} (X_k - 1), \quad 0 \le k \le m \tag{2.2}$$

The recursion now becomes

$$R_{k+1} = 16 R_k + S_k + 16^{-k+1} S_k R_k, \quad 0 \le k \le m \tag{2.3}$$

Since $|S_k|_{\max} = 10$, the selection rules should preserve bounds of scaled
remainders, namely $-2/3 < R_k < 2/3$. This will guarantee that the error of
normalization $|E_{m+1}| = |1 - X_{m+1}| \le \frac{2}{3} \cdot 16^{-m}$ and that for all $k$ there exist
intervals of $R_k$ for each of which a particular $S_k$ is a valid choice. Those
intervals should be overlapped (due to redundancy) so that the continuity of
representation is preserved. This condition has to be satisfied by the
selection procedure. To have practical selection rules, those overlaps
should contain numbers, simple in the binary sense, so that low precision
operations can be used in implementation. The results of the complete
derivation of the selection rules, given in [9], are used in the following
algorithms. The selection rules become simple after the first three steps,
since the correspondence between intervals and $S_k$'s is given by

$$(-2 S_k - 1) < 32 R_k < (-2 S_k + 1), \quad 3 \le k \le m \tag{2.4}$$

indicating that selection can be performed by rounding the scaled remainder
to one non-sign digit. Knowing this, we specify rules for k = 0, 1 and 2
through modified rounding rather than using a table look-up or a direct
combinational approach.

We  now  give  the  following  definitions,  relevant  to  the  description 
of  the  algorithms. 

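Sign and magnitude representation of the constants $S_k$:

$$S_k = (1 - 2 s_s) \sum_{i=0}^{3} s_i 2^{i}, \quad s_i \in \{0, 1\} \text{ for all } i; \tag{2.5}$$

Two's complement representation of scaled remainders:

$$R_k = -r_0 + \sum_{i=1}^{4m} r_i 2^{-i}, \quad r_i \in \{0, 1\} \text{ for all } i; \tag{2.6}$$

Truncated scaled remainder:

$$\hat{R}_k = -r_0 + \sum_{i=1}^{6} r_i 2^{-i} \tag{2.7}$$

Non-sign part of $\hat{R}_k$:

$$T_k = \begin{cases} \sum_{i=1}^{6} r_i 2^{-i} & \text{if } r_0 = 0; \\ \sum_{i=1}^{6} \bar{r}_i 2^{-i} & \text{if } r_0 = 1; \end{cases} \tag{2.8}$$

Step-dependent rounding constant:

$$U_k = \sum_{i=1}^{6} u_i 2^{-i}, \quad u_i \in \{0, 1\} \tag{2.9}$$

and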


\  -  * A> 


Algorithm N (Multiplicative normalization): (2.10)

Step N1. [Initialize]
    $k \leftarrow 0$;
    $S_0 \leftarrow 1$ if $1/2 \le X_0 < 5/8$;
    $S_0 \leftarrow 0$ if $5/8 \le X_0 < 1$;
    $R_1 \leftarrow X_0 (1 + S_0) - 1$;

Step N2. [Loop] for $k < m$ perform:
    $k \leftarrow k + 1$;
    $S_k \leftarrow \lfloor (T_k + U_k) 16 \rfloor$; Sign $S_k \leftarrow -$Sign $R_k$;
    if $k < k_1$ then:
        $R_{k+1} \leftarrow 16 R_k + S_k + 16^{-k+1} S_k R_k$;
    else:
        $R_{k+1} \leftarrow 16 R_k + S_k$;

where $k_1 = (m+3)/2$, $\lfloor Y \rfloor$ denotes the largest integer not larger than $Y$, and the
rounding constant $U_k$ is defined as follows:

Ul  =   U2  =   °5 
u5  =  Krro.?2 

U5  =   VVSV    +  K2   [ro+?l(?2+S)    +  r6]    +  K 

U6  ■  KlVlf  +  K2ro(rl+r2+r5) 
where $K_1$, $K_2$ and $K_3$ stand for k = 1, k = 2 and k ≥ 3 respectively. The
simplification of the remainder recursion for $k \ge k_1$ is due to finite
precision and the decreasing effect of the term $16^{-k+1} S_k R_k$ on $R_{k+1}$. It is
an interesting feature of this procedure that at every step an increasing
number of constants $S_k$ is known. This constitutes the basis for a possible
variable radix approach.
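
The recursion (2.3) and the error bound of Section 2 can be checked with a small exact-arithmetic sketch. It is only an illustration under stated assumptions: the function names are hypothetical, rounding per (2.4) is used for k ≥ 3, and for k = 1 and 2 an exhaustive search over the digit set stands in for the modified rounding rules of [9], which are not reproduced here.

```python
from fractions import Fraction
from math import floor

def select_digit(r, k):
    """Choose S_k in {-10, ..., 10} from the scaled remainder R_k."""
    if k >= 3:
        # Rule (2.4): round -16 * R_k to the nearest integer.
        return floor(-16 * r + Fraction(1, 2))
    # Steps 1 and 2: exhaustive search (an assumption of this sketch,
    # standing in for the modified rounding of the paper).
    coupling = Fraction(16) ** (1 - k)
    return min(range(-10, 11),
               key=lambda s: abs(16 * r + s + coupling * s * r))

def multiplicative_normalize(x0, m):
    """Return (digits S_0..S_m, X_{m+1}) for a Fraction x0 in [1/2, 1)."""
    s0 = 1 if x0 < Fraction(5, 8) else 0      # initial step of Algorithm N
    digits = [s0]
    r = x0 * (1 + s0) - 1                      # R_1
    for k in range(1, m + 1):
        s = select_digit(r, k)
        r = 16 * r + s + Fraction(16) ** (1 - k) * s * r   # recursion (2.3)
        digits.append(s)
    return digits, 1 + r / Fraction(16) ** m   # X_{m+1} = 1 + 16^-m R_{m+1}
```

Calling `multiplicative_normalize(Fraction(3, 4), 8)` returns nine digits and a value of $X_9$ that differs from 1 by no more than $\frac{2}{3} \cdot 16^{-8}$; multiplying $X_0$ by the factors $1 + S_k 16^{-k}$ reproduces the same value exactly.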


3.  Additive Normalization

The additive normalization, as defined by (1.2), is nothing more than
a right directed recoding where one replaces the non-redundant digit set
$\{0, \ldots, 15\}$ with a redundant one $\{\overline{10}, \ldots, 10\}$. The procedure is simple and
exact. Rounding, as a selection rule, applies to all steps. The scaled
remainder is as before (2.2), and since

$$X_{k+1} = X_k - 16^{-k} S_k, \quad 0 \le k \le m \tag{3.1}$$

the basic remainder recursion is

$$R_{k+1} = 16 R_k - S_k, \quad 0 \le k \le m \tag{3.2}$$

and $|R_k| < 2/3$.


Using the definitions (2.5-2.9) we give

Algorithm A (Additive normalization): (3.3)

Step A1. [Initialize]
    $k \leftarrow 0$;
    $S_0 \leftarrow 1$;
    $R_1 \leftarrow X_0 - 1$;

Step A2. [Loop] for $k < m$ perform:
    $k \leftarrow k + 1$;
    $S_k \leftarrow \lfloor (T_k + U_k) 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
    $R_{k+1} \leftarrow 16 R_k - S_k$;

where $U_k = \sum_{i=1}^{6} u_i 2^{-i}$ and

$u_i = 0$ for $i \ne 5$; $u_5 = 1$.
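
The recoding performed by additive normalization can be illustrated with exact arithmetic. This is a hedged sketch rather than the hardware procedure: the function name is hypothetical, the initial step is assumed to be $S_0 = 1$ with $R_1 = X_0 - 1$, and the full remainder is rounded instead of its 6-bit truncation.

```python
from fractions import Fraction
from math import floor

def additive_recode(x0, m):
    """Recode a Fraction x0 in [1/2, 1) into signed digits S_0..S_m."""
    digits = [1]                 # assumed initial step: S_0 = 1
    r = x0 - 1                   # so that R_1 = X_0 - 1 lies in [-1/2, 0)
    for _ in range(m):
        s = floor(16 * r + Fraction(1, 2))   # round 16 * R_k to nearest
        r = 16 * r - s                       # recursion (3.2)
        digits.append(s)
    return digits, r

digits, remainder = additive_recode(Fraction(0xB3A7, 16**4), 4)
```

For the four-digit hexadecimal fraction 0.B3A7 this gives the digits 1, -5, 4, -6, 7 and a zero final remainder, so the sum of $S_k 16^{-k}$ reproduces the input exactly.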


4.  Division

Let $Q = Y_0 / X_0$, where $X_0, Y_0 \in [1/2, 1)$. Then consider

$$Q = \frac{Y_0}{X_0} = \frac{Y_0 \prod_{i=0}^{m} M_i}{X_0 \prod_{i=0}^{m} M_i} \tag{4.1}$$

where

$$M_k = 1 + S_k 16^{-k}, \quad 0 \le k \le m$$

If $X_0 \prod M_i \to 1$, then $Y_0 \prod M_i \to Q$, indicating a possible algorithm.
Constants $S_k$ can be obtained through multiplicative normalization
(Algorithm N) and, if one recursively defines the partial result as

$$Q_{k+1} = Q_k (1 + S_k 16^{-k}), \quad 0 \le k \le m \tag{4.2}$$

the quotient $Q = Q_{m+1}$ with m correct digits can be simultaneously evaluated
in a second arithmetic unit.

Algorithm D (Division): (4.3)

Step D1. [Initialize] $k \leftarrow 0$;
    AU1 (Normalization): Step N1 of Algorithm N;
    AU2 (Result evaluation): $Q_0 \leftarrow Y_0$;

Step D2. [Loop] for $k < m$ perform:
    AU1 (Normalization): Step N2 of Algorithm N;
    AU2 (Result evaluation): $Q_{k+1} \leftarrow Q_k + Q_k S_k 16^{-k}$;

An example is given in Figure 4-1. The implementation is shown in Figure 4-2,
without control details.
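
The way the divisor normalization and the quotient recursion share the same digits can be sketched in a few lines. The function name is hypothetical, and the digit selection here simply picks the digit that brings the normalized divisor closest to 1, an assumption standing in for the selection rules of Algorithm N.

```python
from fractions import Fraction

def divide(y0, x0, m):
    """Approximate y0 / x0 for Fractions x0, y0 in [1/2, 1)."""
    s = 1 if x0 < Fraction(5, 8) else 0      # initial step of Algorithm N
    x, q = x0 * (1 + s), y0 * (1 + s)
    for k in range(1, m + 1):
        w = Fraction(1, 16 ** k)
        # Assumed selection: the digit that brings x closest to 1.
        s = min(range(-10, 11), key=lambda d: abs(x * (1 + d * w) - 1))
        x *= 1 + s * w          # normalization, recursion (2.1)
        q *= 1 + s * w          # result evaluation, recursion (4.2)
    return q
```

Because both recursions apply identical multipliers, the returned value equals $(Y_0 / X_0) \cdot X_{m+1}$, so its relative error is the normalization error $|1 - X_{m+1}|$.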

Conventional multiplication techniques have been made compatible
with the continued sums approach, using additive normalization [9].


5.   Logarithm 

Let $X = X_0 \cdot 2^{E_x}$ be a floating point number with fractional part
$X_0 \in [1/2, 1)$ and exponent $E_x$. Then

$$\ln X = \ln X_0 + E_x \ln 2 \tag{5.1}$$

To find an algorithm which computes $\ln X_0$, one can consider

$$X_0 = X_0 \prod_{i=0}^{m} M_i \Big/ \prod_{i=0}^{m} M_i$$

where $M_k = 1 + 16^{-k} S_k$.

Then,

$$\ln X_0 = \ln\Big(X_0 \prod_{i=0}^{m} M_i\Big) - \sum_{i=0}^{m} \ln(M_i) \tag{5.2}$$

Since $|X_0 \prod_{i=0}^{m} M_i - 1| \le \frac{2}{3} \cdot 16^{-m}$, as described in Section 2,

$$\ln X_0 \approx -\sum_{i=0}^{m} \ln(1 + S_i 16^{-i}) \tag{5.3}$$

Therefore, to compute $\ln X_0$ one needs to perform a summation of precomputed
constants of the form $\ln(1 + S_k 16^{-k})$ stored in a fast read-only memory (ROM).
The $S_k$'s, obtained through multiplicative normalization, serve as keys to access
the corresponding constants. Clearly, the logarithm for any base can be
realized, depending on the stored constants. We consider the natural
logarithm, since the same set of stored constants can be used for the
exponential algorithm. The summation (5.3) is performed recursively:

$$L_{k+1} = L_k - \ln(1 + S_k 16^{-k}), \quad 0 \le k \le m \tag{5.4}$$

$$L_0 = 0$$


The result is correct to m digits, if one assumes that the stored constants
are exactly represented with m digits, which is not the case. Therefore,
either extended precision or smaller accuracy has to be accepted. On the
other hand, the actual requirement for ROM capacity is decreased due to
finite precision. From the power series expansion for the logarithm, it can
be shown that for

$$k > k_1 = (2 \log_2 10 - 1 + 4m)/8 \approx (5.5 + 4m)/8 \tag{5.5}$$

$$\ln(1 + S_k 16^{-k}) = S_k 16^{-k}$$

to within the working precision, thereby reducing the required capacity by
approximately one-half. Evaluation of the term $E_x \ln 2$ does not present a
problem, $E_x$ being of short length, and it is not considered here. The
constant $\ln 2$ can be kept in ROM together with the other constants.

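Algorithm L (Logarithm): (5.6)

Step L1. [Initialize] $k \leftarrow 0$;
    AU1 (Normalization): Step N1 of Algorithm N;
    AU2 (Result evaluation): $L_0 \leftarrow 0$;

Step L2. [Loop] for $k < m$ perform:
    AU1 (Normalization): Step N2 of Algorithm N;
    AU2 (Result evaluation):
        if $k < k_1$ then:
            $L_{k+1} \leftarrow L_k - \ln(1 + S_k 16^{-k})$;
        else:
            $L_{k+1} \leftarrow L_k - S_k 16^{-k}$;

Implementation and an example are shown in Figures 5-2 and 5-1, respectively.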
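
A floating-point sketch of the logarithm evaluation follows. It assumes a hypothetical function name, uses `math.log1p` in place of the ROM constants, and again selects each digit as the one bringing the normalized argument closest to 1 instead of applying the selection rules of Algorithm N. Beyond the threshold (5.5) the constant is replaced by $S_k 16^{-k}$.

```python
import math
from fractions import Fraction

def ln_fraction(x0, m):
    """Approximate ln(x0) for a Fraction x0 in [1/2, 1)."""
    k1 = (2 * math.log2(10) - 1 + 4 * m) / 8     # threshold (5.5)
    s = 1 if x0 < Fraction(5, 8) else 0          # initial step of Algorithm N
    x = x0 * (1 + s)
    total = -math.log1p(s)                       # contribution of M_0
    for k in range(1, m + 1):
        w = Fraction(1, 16 ** k)
        # Assumed selection: the digit that brings x closest to 1.
        s = min(range(-10, 11), key=lambda d: abs(x * (1 + d * w) - 1))
        x *= 1 + s * w
        if k <= k1:
            total -= math.log1p(float(s * w))    # "ROM" constant
        else:
            total -= float(s * w)                # ln(1 + S 16^-k) ~ S 16^-k
    return total
```

With m = 8 the threshold is about 4.7, so only the constants for k ≤ 4 would need to be stored in this sketch, and the result agrees with `math.log` to roughly $16^{-8}$.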

6.  Exponential

To evaluate $e^X$, some preliminary transformations are necessary [1].

Consider

$$e^X = 2^{X \log_2 e} \tag{6.1}$$

Let $X \log_2 e = I + F$ where $I$ and $F$ denote the integer and fractional
parts, respectively. Then,

$$e^X = 2^I e^{F \ln 2} = 2^I e^{X_0} \tag{6.2}$$

and the problem is now evaluation of $e^{X_0}$, where $X_0 = F \ln 2$. The factor $2^I$
is easily incorporated into the exponent part of the result. To simplify
the initial step, we restrict $F$ to negative values, assuming that $I$ is
increased by one when $F > 0$ and $F$ is replaced by $F - 1$. Then $X_0 \in (-\ln 2, 0]$.
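
The preliminary transformation can be sketched as follows. The function name is hypothetical, and taking the ceiling is one way (assumed here) to obtain an integer part $I$ for which the fractional part $F$ is never positive.

```python
import math

def reduce_exponent(x):
    """Split e**x into 2**i * e**x0 with x0 in (-ln 2, 0]."""
    t = x * math.log2(math.e)      # X log2(e) = I + F
    i = math.ceil(t)               # integer part, chosen so that F <= 0
    f = t - i                      # fractional part in (-1, 0]
    return i, f * math.log(2)      # X_0 = F ln 2

i, x0 = reduce_exponent(3.7)
```

The pair satisfies $e^X = 2^I e^{X_0}$, which can be checked with `math.ldexp(math.exp(x0), i)`.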

Once again a convenient identity is used:

$$e^{X_0} = e^{X_0 - \ln(\prod_{i=0}^{m} M_i) + \ln(\prod_{i=0}^{m} M_i)} \tag{6.3}$$

where $M_k = 1 + S_k 16^{-k}$, $0 \le k \le m$.

If $X_0 - \ln(\prod_{i=0}^{m} M_i) \to 0$, then

$$\prod_{i=0}^{m} M_i \to e^{X_0}$$

The recursion for the result evaluation is simple:

$$E_{k+1} = E_k (1 + S_k 16^{-k}), \quad 0 \le k \le m \tag{6.4}$$

$$E_0 = 1$$

while the scaled remainder recursion for additive normalization becomes:

$$R_{k+1} = 16 R_k - 16^{k} \ln(1 + S_k 16^{-k}), \quad 0 \le k \le m \tag{6.5}$$

The required set of precomputed constants is the same as the one
for logarithm. In this case also, for $k > k_1$ (5.5), no constants should be
stored and normalization is defined by Algorithm A. The derivation of the
selection rules, given in [9], will be omitted. The initial step is done
according to the following table:

TABLE 7.1

$X_0$                 $M_0$       $\ln M_0$

$[-1/8, 0]$           $1$         $0$
$[-3/8, -1/8)$        $3/4$       $\ln(3/4)$
$(-\ln 2, -3/8)$      $17/32$     $\ln(17/32)$

Then by proper restrictions of possible $R_k$ ranges the validity of
the rounding rule is preserved in all remaining steps. Values for $M_0$ and
$\ln M_0$ should be stored in the ROM.

We summarize evaluation of $e^X$ as follows:

Preliminary transformations:

a) $I + F = X \log_2 e$

b) $X_0 = F \ln 2$

Algorithm E (Exponential): (6.6)

Step E1. [Initialize] $k \leftarrow 0$;
    AU1 (Normalization): $R_1 \leftarrow X_0 - \ln M_0$;
    AU2 (Result evaluation): $E_1 \leftarrow M_0$;

Step E2. [Loop] for $k < m$ perform: $k \leftarrow k + 1$;
    AU1 (Normalization):
        if $k < k_1$ then:
            $S_k \leftarrow \lfloor (T_k + U_k) 16 \rfloor$; Sign $S_k \leftarrow$ Sign $R_k$;
            $R_{k+1} \leftarrow 16 R_k - 16^{k} \ln(1 + S_k 16^{-k})$;
        else:
            Step A2 of Algorithm A;
    AU2 (Result evaluation): $E_{k+1} \leftarrow E_k + E_k S_k 16^{-k}$;

where $U_k = 1/32$ and $k_1$ is determined by (5.5).

An example and implementation are shown in Figures 6-1 and 6-2,
respectively.
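
The interplay of recursions (6.4) and (6.5) can be tried out with the floating-point sketch below. Several things are assumptions of the sketch and not taken from the paper: the function name, the value $M_0 = 3/4$ for the middle range of the initial table (the scanned entry is hard to read), the use of `math.log1p` instead of ROM constants at every step, and a digit selection that minimizes $|R_{k+1}|$ by exhaustive search in place of the rounding rule.

```python
import math

def exp_fraction(x0, m):
    """Approximate e**x0 for x0 in (-ln 2, 0] with m radix-16 steps."""
    # Initial step; M_0 = 3/4 for the middle range is an assumed reading.
    if x0 >= -1 / 8:
        m0 = 1.0
    elif x0 >= -3 / 8:
        m0 = 3 / 4
    else:
        m0 = 17 / 32
    r = x0 - math.log(m0)          # R_1 = X_0 - ln M_0
    e = m0                         # E_1 = M_0
    for k in range(1, m + 1):
        w = 16.0 ** -k
        # Assumed selection: digit minimizing |R_{k+1}| in recursion (6.5).
        s = min(range(-10, 11),
                key=lambda d: abs(16 * r - math.log1p(d * w) / w))
        r = 16 * r - math.log1p(s * w) / w     # 16^k ln(1 + S_k 16^-k)
        e += e * s * w                          # recursion (6.4)
    return e
```

With m = 8 the returned product agrees with `math.exp(x0)` to about $16^{-8}$ over the whole range $(-\ln 2, 0]$.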

7.  Implementation 

Here we discuss briefly some basic aspects of the radix 16
implementation, comparing them with those of the radix 2 case, and omitting
details related to a particular design. The general configuration for the
described algorithms contains two arithmetic units, operating simultaneously.
The main parts are the same: the adder structure with the multiple formation
networks, the shifting network and the argument register. The adder structure
for radix 16 requires two adders and two select-complement networks, which
represent a major increase in hardware compared to radix 2. The speed of
addition will be only slightly decreased if both adders are unified into
one three-digit adder. We estimate that this part will require twice as
much hardware as the corresponding part in the radix 2 case. If the add
time of the adder in radix 2 is $t_{a2}$, we assume that $t_{a16} < 1.2\, t_{a2}$, for
sufficiently large m. The shifting network, required to shift right/left
k digits, for $0 \le k \le m-1$, is simpler for higher radix. We assume that the
shifting network is realized using a "barrel switch" technique [7]. Namely,
shifting is performed in two or more levels so that the combination of level
shifts corresponds to the required shift. Left shift is performed using the
same data paths, only the two's complement of the number specifying the
shift is given as the control. We assume that radix 2 requires 30% more
hardware  than  radix  16.   (For  example,  if  m  =  kQ,    then  level  1  provides 
displacements of 0, 16 or 32 positions, level 2 provides displacements of
0, 4, 8, or 12 positions and, in the radix 2 case, level 3 would be necessary
with displacements 0, 1, 2, or 3 positions.) Speedwise, the shift time for
radix 2 is assumed to exceed 1.3 times the shift time for radix 16.
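
The level decomposition in the example above can be written out as a small sketch. The function name and the limit of 47 bit positions are assumptions made for illustration; the level displacements are those quoted in the text.

```python
def shift_levels(positions):
    """Split a shift of 0..47 bit positions into per-level displacements."""
    if not 0 <= positions <= 47:
        raise ValueError("shift distance out of range")
    level1 = positions // 16 * 16        # 0, 16 or 32
    level2 = positions % 16 // 4 * 4     # 0, 4, 8 or 12
    level3 = positions % 4               # 0..3, nonzero only for radix 2
    return level1, level2, level3
```

A radix 16 shift by k digits moves 4k bit positions, so its third component is always zero and two levels suffice; an arbitrary radix 2 shift such as 37 positions needs all three levels (32 + 4 + 1).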
Furthermore, the selection procedure for radix 2 requires only a low precision,
4 bits long comparison, while in radix 16 five simple equations and
comparisons of 7 bits are required. In both cases then, the selection
block requirements are neglected.

To obtain some approximate comparisons between those two implementations,
we first define the measure of the performance as the number of bits of the
result per gate level delay. In general, for radix r, the performance
$P_r = \log_2 r / T_r$, where $T_r$ is the total delay necessary to evaluate $\log_2 r$ bits of
the result [4]. Assuming that the add time is dominant over control, $S_k$
select and shift time in all steps, then $T_{16} = 3 T_2$, since in radix 2 the
probability of a zero ($p_0 = 2/3$) is exploited by providing an adder bypass.
Therefore, $P_{16}/P_2 = 4/3$ on the average.
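
The ratio can be written out step by step. The reading assumed here is that a radix 2 step needs the adder only with probability $1 - p_0 = 1/3$, so its average delay is one third of an add time, while a radix 16 step always takes one full add time:

```latex
P_r = \frac{\log_2 r}{T_r}, \qquad
\frac{P_{16}}{P_2}
  = \frac{\log_2 16 / T_{16}}{\log_2 2 / T_2}
  = \frac{4\,T_2}{T_{16}}
  = \frac{4\,T_2}{3\,T_2}
  = \frac{4}{3}.
```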

To  estimate  the  efficiency  of  the  implementation  we  consider  the  ratio 
between  performance  and  cost  per  bit.  With  previous  assumptions,  we  obtain 
that $E_{16}/E_2 = 1$. If ROM capacity is taken into account then radix 2 offers
a more efficient design, but radix 16 maintains better performance. It should
be  noted  that  radix  16  implementation  offers  constant  execution  time,  while 
radix  2  has  a  variable  one. 

The  need  for  two  arithmetic  units  might  be  considered  as  a 
disadvantage.  We  outline  here  how  the  performance  can  be  maintained  even 
if  only  one  arithmetic  unit  is  used.  We  first  recall  that  there  are  two 
processes,  namely  normalization  and  result  evaluation,  and  that  for  both 
appearance  of  the  result  is  essentially  determined  by  the  carry  propagation. 
The  sequential  organization,  shown  in  Figure  7-1,  eliminates  one  arithmetic 
unit  by  interleaving  two  processes.   But  the  performance  is  also  reduced, 
i.e.,  the  execution  is  around  two  times  slower.   One  way  to  preserve  double 
unit performance while using only one arithmetic unit is to "pipeline" two
processes by partitioning adder structure and data paths into separately
controlled groups and then to overlap processes. For example, let there be
n groups of m/n digits, and let a superscript denote the part processed by a
group. Then, as soon as the group $G_i$ has finished generating the result
part $x_{k+1}^{i}$, it can be transferred into register $R_i$ (Figure 7-2) and part
$q^{i}$ can be processed while group $G_{i-1}$ processes part $x^{i-1}$, etc. In other
words, result parts for the two processes, $x_{k+1}$ and $q_{k+1}$, are simultaneously
obtained. After the last part of $x_{k+1}$ has been computed, $S_{k+1}$ can be
determined and a new cycle initiated. The "pipeline" will work with maximal
efficiency, since both processes are always present. Although there will be
extra control, the reduction to one arithmetic unit with a slight decrease in
speed may yield the result that the "pipeline" approach is the optimal one.

Conclusion 

The  algorithms  based  on  continued  products  (sums)  provide  simplicity 
and  uniformity,  important  for  efficient  implementation  of  a  wider  class  of 
elementary  functions.  The  available  technology  could  easily  justify 
realization  of  arithmetic  units  also  evaluating  the  elementary  functions, 
usually  implemented  in  software.   This  approach,  without  significantly 
affecting  relative  cost  of  arithmetic  unit  design,  might  have  an  advantage 
in  multiprogramming  and  multiprocessing  systems. 

The radix 16 approach, described here, offers faster implementation
than the radix 2. The selection rules, the major problem in a higher radix,
remain relatively simple. A useful property of the recursion based on the
continued products could be exploited in even higher radix implementation,
thus yielding further speed-up. Namely, after performing initial steps
($k < 3$), not only constant $S_k$ but also constants $S_{k+1}, \ldots, S_{2k-3}$ can be
determined at the step k. One could also think about a variable radix
approach,  performing  the  initial,  most  difficult  steps  in  some  low  radix 
and,  as  the  selection  process  becomes  easier,  increasing  the  radix  at  every 
step. 

Whether  square  root,  trigonometric  and  inverse  trigonometric 
functions  can  be  easily  included  in  radix  16  approach,  remains  to  be 
determined  by  finding  corresponding  selection  rules,  but  it  is  believed 
that  this  is  possible. 